TCBERT: A Technical Report for Chinese Topic Classification BERT
Bidirectional Encoder Representations from Transformers or
BERT~\cite{devlin-etal-2019-bert} has been one of the base models for various
NLP tasks due to its remarkable performance. Variants customized for different
languages and tasks are proposed to further improve the performance. In this
work, we investigate supervised continued
pre-training~\cite{gururangan-etal-2020-dont} on BERT for Chinese topic
classification task. Specifically, we incorporate prompt-based learning and
contrastive learning into the pre-training. To adapt to the task of Chinese
topic classification, we collect around 2.1M Chinese data spanning various
topics. The pre-trained Chinese Topic Classification BERTs (TCBERTs) with
different parameter sizes are open-sourced at
\url{https://huggingface.co/IDEA-CCNL}.
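The abstract says contrastive learning is incorporated into the supervised continued pre-training but does not give the objective; a common choice when topic labels are available is a SupCon-style supervised contrastive loss. A minimal NumPy sketch under that assumption (the function name, temperature, and embedding interface are illustrative, not TCBERT's actual implementation):

```python
import numpy as np

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """SupCon-style loss: embeddings that share a topic label are treated
    as positives and pulled together; all other pairs are pushed apart."""
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine geometry
    sim = (z @ z.T) / temperature                      # pairwise similarities
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)            # exclude self-pairs
    # log-softmax of each row over all other examples in the batch
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    per_anchor = np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor /= np.maximum(pos.sum(axis=1), 1)       # mean over positives
    return -per_anchor.mean()
```

Embeddings whose same-topic neighbors score high under the row-wise softmax yield a low loss, which is the pressure that shapes the encoder during continued pre-training.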
CMB: A Comprehensive Medical Benchmark in Chinese
Large Language Models (LLMs) offer the possibility of a major breakthrough in
medicine. Establishing a standardized medical benchmark is a fundamental
cornerstone for measuring progress. However, medical environments in different
regions have local characteristics, e.g., the ubiquity and significance of
traditional Chinese medicine within China. Therefore, merely translating
English-based medical evaluations may result in \textit{contextual
incongruities} for a local region. To address this issue, we
propose a localized medical benchmark called CMB, a Comprehensive Medical
Benchmark in Chinese, designed and rooted entirely within the native Chinese
linguistic and cultural framework. While traditional Chinese medicine is
integral to this evaluation, it does not constitute its entirety. Using this
benchmark, we have evaluated several prominent large-scale LLMs, including
ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical
domain. It is worth noting that our benchmark is not devised as a leaderboard
competition but as an instrument for self-assessment of model advancements. We
hope this benchmark could facilitate the widespread adoption and enhancement of
medical LLMs within China. Details are available at
\url{https://cmedbenchmark.llmzoo.com/}.
AceGPT, Localizing Large Language Models in Arabic
This paper explores the imperative need and methodology for developing a
localized Large Language Model (LLM) tailored for Arabic, a language with
unique cultural characteristics that are not adequately addressed by current
mainstream models like ChatGPT. Additional key concerns arise around cultural
sensitivity and local values. To this end, the paper
outlines a packaged solution, including further pre-training with Arabic texts,
supervised fine-tuning (SFT) using native Arabic instructions and GPT-4
responses in Arabic, and reinforcement learning with AI feedback (RLAIF) using
a reward model that is sensitive to local culture and values. The objective is
to train culturally aware and value-aligned Arabic LLMs that can serve the
diverse application-specific needs of Arabic-speaking communities.
Extensive evaluations demonstrate that the resulting LLM, `AceGPT', is the
state-of-the-art open Arabic LLM across various benchmarks, including an instruction-following
benchmark (i.e., Arabic Vicuna-80 and Arabic AlpacaEval), knowledge benchmark
(i.e., Arabic MMLU and EXAMs), as well as the newly-proposed Arabic cultural \&
value alignment benchmark. Notably, AceGPT outperforms ChatGPT in the popular
Vicuna-80 benchmark when evaluated with GPT-4, despite the benchmark's limited
scale.
Codes, data, and models are available at \url{https://github.com/FreedomIntelligence/AceGPT}.
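The abstract names RLAIF with a culture- and value-sensitive reward model but does not specify the training objective; a common choice for reward models is a pairwise Bradley-Terry preference loss over chosen/rejected responses. A minimal NumPy sketch under that assumption (the function name and scalar-reward interface are illustrative, not AceGPT's actual code):

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry style reward-model loss: the reward model is pushed
    to score the (e.g., culturally preferred) response above the rejected
    one. Inputs are scalar reward outputs, one per comparison pair."""
    diff = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    # -log sigmoid(diff), written as log(1 + exp(-diff))
    return np.mean(np.log1p(np.exp(-diff)))
```

When the chosen response already scores higher, the loss is small; when the ordering is wrong, the loss grows roughly linearly in the margin, so gradient descent on this objective teaches the reward model the annotated preferences.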
Assessment of treatment response during chemoradiation therapy for pancreatic cancer based on quantitative radiomic analysis of daily CTs: An exploratory study.
In an effort toward early assessment of treatment response, we investigated radiation-induced changes in quantitative CT features of the tumor during the delivery of chemoradiation therapy (CRT) for pancreatic cancer.

Diagnostic-quality CT data acquired daily during routine CT-guided CRT using a CT-on-rails system for 20 pancreatic head cancer patients were analyzed. On each daily CT, the pancreatic head, the spinal cord, and the aorta were delineated, and the histograms of CT number (CTN) within these contours were extracted. Eight histogram-based radiomic metrics, including the mean CTN (MCTN), peak position, volume, standard deviation (SD), skewness, kurtosis, energy, and entropy, were calculated for each fraction. A paired t-test was used to check the significance of the change in a specific metric at a specific time. A generalized estimating equation (GEE) model was used to test the association between changes in metrics over time for different pathological responses.

In general, the CTN histogram in the pancreatic head (but not in the spinal cord) changed during CRT delivery. Changes from the 1st to the 26th fraction in MCTN ranged from -15.8 to 3.9 HU, with an average of -4.7 HU (p<0.001). Meanwhile, the volume decreased, the skewness increased (less skewed), and the kurtosis decreased (less peaked). The changes in MCTN, volume, skewness, and kurtosis became significant after two weeks of treatment. Pathological response was associated with changes in MCTN, SD, and skewness. In cases of good response, patients tended to have large reductions in MCTN and skewness, and large increases in SD and kurtosis.

Significant changes in CT radiomic features, such as the MCTN, skewness, and kurtosis of the tumor, were observed during the course of CRT for pancreatic cancer based on quantitative analysis of daily CTs. These changes may potentially be used for early assessment of treatment response and stratification for therapeutic intensification.
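The eight histogram-based metrics named in the abstract can be computed directly from the CT numbers inside a delineated contour. A minimal NumPy sketch, assuming the CTN values have already been extracted as an array; the bin count, HU range, and function name are illustrative assumptions, not values from the paper:

```python
import numpy as np

def histogram_metrics(ctn_hu, voxel_volume_cc=None, bins=64, hu_range=(-100.0, 200.0)):
    """First-order histogram metrics of CT numbers (HU) within a contour,
    matching the eight metrics named in the abstract. Bin count and HU
    range are illustrative assumptions."""
    ctn = np.asarray(ctn_hu, dtype=float)
    hist, edges = np.histogram(ctn, bins=bins, range=hu_range)
    p = hist / hist.sum()                          # normalized histogram
    centers = 0.5 * (edges[:-1] + edges[1:])       # bin center positions
    mean = ctn.mean()                              # mean CT number (MCTN)
    sd = ctn.std()
    metrics = {
        "MCTN": mean,
        "SD": sd,
        "peak_position": centers[p.argmax()],      # mode of the histogram
        "skewness": ((ctn - mean) ** 3).mean() / sd ** 3,
        "kurtosis": ((ctn - mean) ** 4).mean() / sd ** 4,  # normal -> 3
        "energy": (p ** 2).sum(),
        "entropy": -(p[p > 0] * np.log2(p[p > 0])).sum(),
    }
    if voxel_volume_cc is not None:                # volume = voxel count x voxel size
        metrics["volume_cc"] = ctn.size * voxel_volume_cc
    return metrics
```

Tracking these per-fraction values over the course of CRT gives the longitudinal curves the abstract analyzes with paired t-tests and a GEE model.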
Patient characteristics, treatment methods and outcome data.
The correlation of the change and initial value of MCTN.
The straight line is the best linear fit.
Comparisons of the average changes for the good- and poor-response groups.
(a) Moments of the histogram, including MCTN, SD, skewness, and kurtosis, and (b) volume in the GTV from the first RT fraction during the course of CRT. The error bars are the standard errors of the values across the cohort.